Enhance arXiv retrieval robustness: add retry logic for HTTP errors #261
Open
dekrt wants to merge 3 commits into
Make arXiv retrieval resilient to transient 5xx API failures in `calculate-and-send`
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves arXiv batch retrieval resilience by retrying additional transient HTTP errors (not just 429), skipping batches after exhausting retries, and adding tests to validate retryable vs non-retryable behavior.
Changes:
- Add a shared set of retryable arXiv HTTP statuses and use it in `_retrieve_raw_papers`.
- Skip a batch after max retries for retryable status codes and continue processing.
- Add pytest coverage for retryable (503) and non-retryable (400) HTTP error handling.
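
The retry-then-skip behavior listed above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: `ApiHTTPError`, `retrieve_in_batches`, `fetch_batch`, and `delay_seconds` are hypothetical names, and the real client may expose the status code differently. Only `RETRYABLE_ARXIV_STATUSES`, `max_batch_retries`, `batch_succeeded`, the batch size of 20, and the status codes themselves come from the PR.

```python
import logging
import time

logger = logging.getLogger(__name__)

# Status codes named in the PR description as retryable.
RETRYABLE_ARXIV_STATUSES = frozenset({429, 500, 502, 503, 504})


class ApiHTTPError(Exception):
    """Hypothetical stand-in for the HTTP error raised by the arXiv client."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def retrieve_in_batches(ids, fetch_batch, batch_size=20,
                        max_batch_retries=3, delay_seconds=0.0):
    """Fetch ids in batches; retry transient errors, skip exhausted batches."""
    papers = []
    for i in range(0, len(ids), batch_size):
        batch = ids[i:i + batch_size]
        batch_succeeded = False
        for attempt in range(max_batch_retries):
            try:
                papers.extend(fetch_batch(batch))
                batch_succeeded = True
                break
            except ApiHTTPError as e:
                status = e.status
                retryable = status in RETRYABLE_ARXIV_STATUSES
                if retryable and attempt < max_batch_retries - 1:
                    # Transient failure with attempts left: wait, then retry.
                    logger.warning(f"arXiv API {status}, retrying batch {i // batch_size}")
                    time.sleep(delay_seconds)
                elif retryable:
                    # Attempts exhausted: give up on this batch only.
                    logger.warning(
                        f"Skipping batch {i // batch_size} after "
                        f"{max_batch_retries} retries due to arXiv API {status}"
                    )
                    break
                else:
                    # Non-retryable (e.g. 400): surface the error to the caller.
                    raise
        if not batch_succeeded:
            logger.warning(f"No papers retrieved for batch {i // batch_size}")
    return papers
```

The design choice worth noting is that a persistently failing batch costs only that batch's papers, while a non-retryable status still aborts the whole run.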
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/retriever/test_arxiv_retriever.py | Adds tests covering skipped batches after retryable errors and raising on non-retryable HTTP errors. |
| src/zotero_arxiv_daily/retriever/arxiv_retriever.py | Expands retry logic to multiple transient HTTP statuses and adds batch-skipping behavior after retries. |
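
As a rough illustration of the two test cases described in the table, the sketch below checks a retryable 503 and a non-retryable 400 against a simplified single-batch helper. The names `StubHTTPError` and `fetch_with_retries` are hypothetical and do not appear in the PR; the actual tests in `tests/retriever/test_arxiv_retriever.py` may be structured quite differently. The functions use plain `assert` so that pytest would collect them, but they avoid importing pytest so the block runs standalone (`pytest.raises` would be the more idiomatic choice inside the real suite).

```python
from unittest.mock import Mock

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


class StubHTTPError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_with_retries(fetch, max_retries=3):
    """Return fetch() result, or None if retryable errors use up all attempts."""
    for _ in range(max_retries):
        try:
            return fetch()
        except StubHTTPError as e:
            if e.status not in RETRYABLE_STATUSES:
                raise  # non-retryable: propagate immediately
    return None  # batch skipped


def test_retryable_503_skips_batch():
    # Mock raises the same 503 error on every call.
    fetch = Mock(side_effect=StubHTTPError(503))
    assert fetch_with_retries(fetch) is None
    assert fetch.call_count == 3


def test_non_retryable_400_raises():
    fetch = Mock(side_effect=StubHTTPError(400))
    try:
        fetch_with_retries(fetch)
    except StubHTTPError as e:
        assert e.status == 400
    else:
        raise AssertionError("expected StubHTTPError")
    # A non-retryable error should not be retried.
    assert fetch.call_count == 1
```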
Comment on lines +157 to +165
            elif status in RETRYABLE_ARXIV_STATUSES:
                logger.warning(
                    f"Skipping batch {i // 20} after {max_batch_retries} retries due to arXiv API {status}"
                )
                break
            else:
                raise
    if not batch_succeeded:
        logger.warning(f"No papers retrieved for batch {i // 20}")
This pull request improves the robustness of the arXiv paper retrieval process by handling additional retryable HTTP errors and adds tests to ensure correct behavior in these scenarios. The main changes are the introduction of a set of retryable status codes, enhanced error handling logic during batch retrieval, and new tests for these cases.
Error handling improvements:
- Added `RETRYABLE_ARXIV_STATUSES` in `arxiv_retriever.py` to define which HTTP status codes (429, 500, 502, 503, 504) should trigger a retry when communicating with the arXiv API.
- Updated the `_retrieve_raw_papers` method to retry on any status in `RETRYABLE_ARXIV_STATUSES`, log appropriate warnings, and skip batches after maximum retries, ensuring only truly unrecoverable errors are raised.

Testing improvements:
- Added `pytest` as a test dependency for enhanced testing capabilities.